{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SELFIES Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "uTkl9w4PBc-E",
    "outputId": "8b789a83-bf66-4902-d77a-76c2d1a1d8f6",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('../')\n",
    "\n",
    "import selfies as sf\n",
    "from rdkit import Chem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tM3wFk1e_COd"
   },
   "source": [
    "## Standard Usage\n",
    "First, let's translate a SMILES string to SELFIES and then back to SMILES. We will use a non-fullerene acceptor for organic solar cells as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 89
    },
    "colab_type": "code",
    "id": "GH0DQxBN_Fei",
    "outputId": "56aa043e-df48-4081-f938-49711a166d33"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original SMILES: CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1\n",
      "Translated SELFIES: [C][N][C][Branch1_2][C][=O][C][=C][Branch2_1][Ring2][Branch1_3][C][=C][C][=C][Branch1_1][Ring2][S][Ring1][Branch1_1][C][S][C][Branch1_1][N][C][=N][C][=C][Branch1_1][Ring1][C][#N][S][Ring1][Branch1_3][=C][C][Expl=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1_1][N][Branch1_1][C][C][C][Branch1_2][C][=O][C][Ring2][Ring1][=N][=C][Ring2][Ring1][P][C][=C][C][=C][Branch1_1][Ring2][S][Ring1][Branch1_1][C][S][C][Branch1_1][N][C][=N][C][=C][Branch1_1][Ring1][C][#N][S][Ring1][Branch1_3][=C][C][Expl=Ring1][N][C][Ring1][S][O][C][C][O][Ring1][Branch1_1]\n",
      "Translated SMILES: CN7C(=O)C6=C(C1=CC4=C(S1)C=3SC(C2=NC=C(C#N)S2)=CC=3C45OCCO5)N(C)C(=O)C6=C7C8=CC%11=C(S8)C=%10SC(C9=NC=C(C#N)S9)=CC=%10C%11%12OCCO%12\n"
     ]
    }
   ],
   "source": [
    "smiles = \"CN1C(=O)C2=C(c3cc4c(s3)-c3sc(-c5ncc(C#N)s5)cc3C43OCCO3)N(C)C(=O)\" \\\n",
    "         \"C2=C1c1cc2c(s1)-c1sc(-c3ncc(C#N)s3)cc1C21OCCO1\"\n",
    "encoded_selfies = sf.encoder(smiles)  # SMILES --> SELFIES\n",
    "decoded_smiles = sf.decoder(encoded_selfies)  # SELFIES --> SMILES\n",
    "\n",
    "print(f\"Original SMILES: {smiles}\")\n",
    "print(f\"Translated SELFIES: {encoded_selfies}\")\n",
    "print(f\"Translated SMILES: {decoded_smiles}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Ng8PmMiB_RvJ"
   },
   "source": [
    "When comparing the original and decoded SMILES, do not use `==` string equality: the same molecule can be written as many different SMILES strings. Instead, canonicalize both SMILES with RDKit and compare the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "iAc5FVrP_XV6",
    "outputId": "b503f896-a2a0-46a6-fc5b-9c474c01ba62"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "== Equals: False\n",
      "RDKit Equals: True\n"
     ]
    }
   ],
   "source": [
    "print(f\"== Equals: {smiles == decoded_smiles}\")\n",
    "\n",
    "# Recommended: compare canonical SMILES\n",
    "can_smiles = Chem.CanonSmiles(smiles)\n",
    "can_decoded_smiles = Chem.CanonSmiles(decoded_smiles)\n",
    "print(f\"RDKit Equals: {can_smiles == can_decoded_smiles}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "IKfNr5m6_h4f"
   },
   "source": [
    "## Advanced Usage\n",
    "Now let's customize the SELFIES semantic constraints. We will first look at the defaults."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 200
    },
    "colab_type": "code",
    "id": "Xmce7wvV_t4Y",
    "outputId": "8b10af2f-486e-4910-8a71-055b59a09746"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Default Constraints:\n",
      " {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'N': 3, 'C': 4, 'P': 5, 'S': 6, '?': 8}\n"
     ]
    }
   ],
   "source": [
    "default_constraints = sf.get_semantic_constraints()\n",
    "print(f\"Default Constraints:\\n {default_constraints}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "HJNWkTJDADui"
   },
   "source": [
    "We have two compounds here, CS=CC#S and \\[Li\\]=CC, which we encode into SELFIES form. Under the default SELFIES constraints, they decode back to the same SMILES. Note that since Li has no entry of its own in the default constraints, it falls under the `?` entry and is limited to 8 bonds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "2rQQuBUKAKYE"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CS=CC#S --> CS=CC#S\n",
      "[Li]=CC --> [Li]=CC\n"
     ]
    }
   ],
   "source": [
    "c_s_compound = sf.encoder(\"CS=CC#S\")\n",
    "li_compound = sf.encoder(\"[Li]=CC\")\n",
    "\n",
    "print(f\"CS=CC#S --> {sf.decoder(c_s_compound)}\")\n",
    "print(f\"[Li]=CC --> {sf.decoder(li_compound)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KevVGyIEAVlu"
   },
   "source": [
    "We can add Li to the SELFIES constraints and restrict it to 1 bond. We can also restrict S to 2 bonds (instead of its default 6). After setting the new constraints, we print them to confirm the update."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "y5EbmzkKATkD"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Updated Constraints:\n",
      " {'H': 1, 'F': 1, 'Cl': 1, 'Br': 1, 'I': 1, 'O': 2, 'N': 3, 'C': 4, 'P': 5, 'S': 2, '?': 8, 'Li': 1}\n"
     ]
    }
   ],
   "source": [
    "new_constraints = dict(default_constraints)  # copy, so the defaults are not mutated\n",
    "new_constraints['Li'] = 1\n",
    "new_constraints['S'] = 2\n",
    "\n",
    "sf.set_semantic_constraints(new_constraints)  # update constraints\n",
    "\n",
    "print(f\"Updated Constraints:\\n {sf.get_semantic_constraints()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "e_djGmr_AvM7"
   },
   "source": [
    "Under our new settings, our previous molecules are translated like so. Notice that the new semantic constraints are met: each S now forms at most 2 bonds, and Li forms only 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "TCzbjZMAAxpo"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CS=CC#S --> CSCC=S\n",
      "[Li]=CC --> [Li]CC\n"
     ]
    }
   ],
   "source": [
    "print(f\"CS=CC#S --> {sf.decoder(c_s_compound)}\")\n",
    "print(f\"[Li]=CC --> {sf.decoder(li_compound)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Ng1Lr_e6A3cB"
   },
   "source": [
    "To revert to the default constraints, call `sf.set_semantic_constraints()` with no arguments:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "zwC00Rx5A6eQ"
   },
   "outputs": [],
   "source": [
    "sf.set_semantic_constraints()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "selfies_example.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
